iT邦幫忙

2024 iThome Ironman Contest

DAY 10
AI/ ML & Data

Getting Started with AI Projects: From Image Classification to Model Deployment series, part 10

[Day 10] Ways to Build a Model (2): The Functional API


Introduction

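Yesterday we covered how to build a Sequential model. Today we look at the Functional API, which gives you more freedom than the Sequential approach and suits more complex models, such as models with branches or other non-linear topologies. Models with multiple inputs or multiple outputs are also a good fit for the Functional API.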
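Here is a minimal sketch of a branching model with two inputs and two outputs. The inputs (a 64×64 image plus a 4-value metadata vector), the layer sizes, and the names `image`, `meta`, `class` and `score` are all illustrative assumptions. None of them belong to this series' model.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Concatenate, Conv2D, Dense, GlobalAveragePooling2D

# Hypothetical inputs: a small RGB image plus a 4-value metadata vector
image_in = Input(shape=(64, 64, 3), name="image")
meta_in = Input(shape=(4,), name="meta")

# Branch 1: a tiny convolutional feature extractor for the image
x = Conv2D(filters=16, kernel_size=(3, 3), padding="same", activation="relu")(image_in)
x = GlobalAveragePooling2D()(x)  # -> (None, 16)

# Branch 2: a dense layer for the metadata
y = Dense(units=8, activation="relu")(meta_in)  # -> (None, 8)

# Merge the two branches, then split into two outputs
merged = Concatenate()([x, y])  # -> (None, 24)
class_out = Dense(units=5, activation="softmax", name="class")(merged)
score_out = Dense(units=1, name="score")(merged)

model = Model(inputs=[image_in, meta_in], outputs=[class_out, score_out])
model.summary()
```

A Sequential model cannot express this. It has exactly one input and one output, and every layer feeds only the next one.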

The Functional API

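The key idea of the Functional API is that layers are treated as functions and data (tensors) as their inputs. This feels slightly different from stacking layers one at a time in a Sequential model: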

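from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Conv2D

# Sequential model: create an empty model, then add layers one by one
model = Sequential()
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
# Alternative: pass the list of layers to the constructor
model = Sequential([
    Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"),
])
# Functional API: call the layer on a tensor
# x stands in for the output of a previous layer (here an Input tensor, so the snippet runs)
x = Input(shape=(256, 256, 3))
x = Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu")(x)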

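The code above shows that the Functional API states explicitly which variable is fed into each layer. The resulting output variable can in turn serve as the input to other layers, so layers can be reused. This high degree of freedom is what makes the Functional API suitable for building models with complex structures.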
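Here is a minimal sketch of layer reuse, assuming two hypothetical inputs of length 8. A single `Dense` instance is called on both inputs, so both paths share the same weights.

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

# Two hypothetical inputs, each a feature vector of length 8
input_a = Input(shape=(8,))
input_b = Input(shape=(8,))

# Create ONE Dense layer and call it on both inputs, so both paths share its weights
shared_dense = Dense(units=4, activation="relu")
out_a = shared_dense(input_a)
out_b = shared_dense(input_b)

model = Model(inputs=[input_a, input_b], outputs=[out_a, out_b])

# The shared weights are counted once: 8 * 4 kernel weights + 4 biases = 36
print(model.count_params())  # 36

# Feeding the same data to both inputs gives identical outputs, because the weights are shared
sample = np.ones((1, 8), dtype="float32")
pred_a, pred_b = model.predict([sample, sample], verbose=0)
print(np.allclose(pred_a, pred_b))  # True
```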

Let's rewrite yesterday's Sequential model with the Functional API. First, import the required classes:

from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.models import Model

Here is the Sequential model code rewritten with the Functional API:

input_layer = Input(shape=(256, 256, 3))
x = Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu")(input_layer)
x = Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Flatten()(x)
x = Dense(units=4096, activation="relu")(x)
x = Dense(units=4096, activation="relu")(x)
output_layer = Dense(units=5, activation="softmax")(x)
model = Model(inputs=input_layer, outputs=output_layer)

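The final Dense layer produces the model's output. The last line then creates the model by specifying the input layer at the start and the output layer at the end.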
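Every intermediate tensor is held in a variable, so you can also build a second `Model` that stops at any of them, for example to inspect feature maps. To keep it light, the sketch below uses a separate, shortened network; the full model above works the same way. The names `demo_input`, `features`, `demo_model` and `feature_extractor` are illustrative.

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.models import Model

# A shortened network, used only for illustration
demo_input = Input(shape=(256, 256, 3))
x = Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu")(demo_input)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
features = Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(features)
x = Flatten()(x)
demo_output = Dense(units=5, activation="softmax")(x)
demo_model = Model(inputs=demo_input, outputs=demo_output)

# A second model ending at the intermediate tensor `features`;
# it reuses the same layer objects (and therefore the same weights) as demo_model
feature_extractor = Model(inputs=demo_input, outputs=features)
print(feature_extractor.output.shape)    # (None, 128, 128, 128)
print(feature_extractor.count_params())  # 75648 = 1792 + 73856
```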

Use summary() to inspect the model structure:

model.summary()

Output:

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 256, 256, 3)]     0         
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 256, 256, 64)      1792      
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 256, 256, 64)      36928     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 128, 128, 64)      0         
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 128, 128, 128)     73856     
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 128, 128, 128)     147584    
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 64, 64, 128)       0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 64, 64, 256)       295168    
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 64, 64, 256)       590080    
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 64, 64, 256)       590080    
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 32, 32, 512)       1180160   
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 32, 32, 512)       2359808   
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 32, 32, 512)       2359808   
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 16, 16, 512)       0         
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 16, 16, 512)       2359808   
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 16, 16, 512)       2359808   
_________________________________________________________________
conv2d_25 (Conv2D)           (None, 16, 16, 512)       2359808   
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 8, 8, 512)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 32768)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 4096)              134221824 
_________________________________________________________________
dense_4 (Dense)              (None, 4096)              16781312  
_________________________________________________________________
dense_5 (Dense)              (None, 5)                 20485     
=================================================================
Total params: 165,738,309
Trainable params: 165,738,309
Non-trainable params: 0
_________________________________________________________________

This model is identical to yesterday's.
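The parameter counts in the summary follow from two standard formulas:

- A `Conv2D` layer has kernel_h × kernel_w × in_channels × out_channels weights, plus one bias per filter.
- A `Dense` layer has in_units × out_units weights, plus one bias per unit.

The plain-Python sketch below re-derives the totals for the model above. The helper names are illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_ch, out_ch):
    # one kernel_h x kernel_w x in_ch filter per output channel, plus one bias each
    return kernel_h * kernel_w * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    # full weight matrix plus one bias per output unit
    return in_units * out_units + out_units


# (in_channels, out_channels) of the 13 convolutional layers, all with 3x3 kernels
convs = [(3, 64), (64, 64),
         (64, 128), (128, 128),
         (128, 256), (256, 256), (256, 256),
         (256, 512), (512, 512), (512, 512),
         (512, 512), (512, 512), (512, 512)]
conv_total = sum(conv2d_params(3, 3, c_in, c_out) for c_in, c_out in convs)

# Five 2x2 max-pools: 256 / 2**5 = 8, so Flatten outputs 8 * 8 * 512 = 32768 values
flat = (256 // 2**5) * (256 // 2**5) * 512
dense_total = dense_params(flat, 4096) + dense_params(4096, 4096) + dense_params(4096, 5)

print(conv_total)                # 14714688
print(dense_total)               # 151023621
print(conv_total + dense_total)  # 165738309
```

The first Dense layer alone holds about 134 million of the roughly 166 million parameters. This is why the input size, which sets the Flatten width, matters so much for this architecture.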

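Keras ships with built-in functions for many well-known neural network architectures, typically with weights pretrained on ImageNet. ImageNet is a large image database containing more than 14 million images across more than 20,000 categories. It has contributed greatly to research on image recognition and classification, and VGGNet, GoogLeNet (Inception) and ResNet were all trained on it. The model functions available in Keras include the VGG, ResNet, Inception, Xception, MobileNet, DenseNet, NASNet, EfficientNet and ConvNeXt families.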

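For example, for VGG16, which this series uses, you can call the built-in VGG16() function directly instead of building the network yourself. Building it yourself has advantages too, though: you can change its hyperparameters or adapt it to your needs.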
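To illustrate that flexibility, the sketch below wraps the hand-built model in a hypothetical helper, `build_vgg_like()`. Its arguments are the input size, the filter count and number of convolutions per block, the dense width and the class count. The defaults reproduce the model built earlier in this post; the smaller configuration is just one example of changing the hyperparameters.

```python
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.models import Model


def build_vgg_like(input_shape=(256, 256, 3),
                   block_config=((64, 2), (128, 2), (256, 3), (512, 3), (512, 3)),
                   dense_units=4096,
                   num_classes=5):
    """Hypothetical helper: each (filters, n_convs) pair becomes one conv block followed by max pooling."""
    inputs = Input(shape=input_shape)
    x = inputs
    for filters, n_convs in block_config:
        for _ in range(n_convs):
            x = Conv2D(filters=filters, kernel_size=(3, 3), padding="same", activation="relu")(x)
        x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(units=dense_units, activation="relu")(x)
    x = Dense(units=dense_units, activation="relu")(x)
    outputs = Dense(units=num_classes, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs)


# Default arguments: same architecture as the hand-written model above
model = build_vgg_like()
print(model.count_params())  # 165738309

# Tweaked hyperparameters: smaller input, three lighter blocks, narrower dense layers
small = build_vgg_like(input_shape=(128, 128, 3),
                       block_config=((32, 2), (64, 2), (128, 2)),
                       dense_units=256)
print(small.count_params())  # 8742949
```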

How to use the VGG16() function:

import tensorflow as tf
from tensorflow.keras.applications import VGG16

inputs = tf.keras.Input(shape=(224, 224, 3))
base_model = VGG16(include_top=True, weights='imagenet', input_tensor=inputs)
outputs = base_model.output
model = tf.keras.Model(inputs=inputs, outputs=outputs)

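Setting include_top to True in VGG16() means the model includes the fully connected layers at the top. Setting it to False leaves them out, so you can build your own top layers as needed. With include_top=True and weights='imagenet', the settings must match those originally used to train on ImageNet, otherwise the code raises an error; for example, the input image size must be (224, 224, 3). input_tensor specifies the input layer.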

Output of summary():

model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________

The final output has 1000 units, one per class. This is the number of classes used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a competition based on the ImageNet dataset.
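To see what include_top changes, the sketch below builds VGG16 both ways and compares the parameter counts. It passes weights=None (random initialization) only so that it runs without downloading the ImageNet weights; the architecture is the same either way. With include_top=False, a 256×256 input is accepted. Five 2×2 poolings shrink it by a factor of 2^5 = 32, so the final feature map is 8×8×512.

```python
from tensorflow.keras.applications import VGG16

# weights=None: random weights, no download; enough to inspect structure and sizes
full = VGG16(include_top=True, weights=None)  # default 224x224x3 input, 1000-class head
base = VGG16(include_top=False, weights=None, input_shape=(256, 256, 3))  # convolutional base only

print(full.count_params())  # 138357544
print(base.count_params())  # 14714688
print(base.output.shape)    # (None, 8, 8, 512)
```

About 124 million of the full model's parameters sit in the three fully connected layers. Dropping them is what makes room for a new head sized for your own classes.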

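To fit the dataset used in this series, the code needs a few changes. First, set include_top to False. The Bear dataset has 5 classes in total, so we can't keep the original output layer with its 1000 neurons. (Recap: this series uses the Bear dataset as its example; from here on I'll call it the bear dataset.) In this series the images are 256×256, so the input layer's image size must change as well. The code is as follows: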

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(256, 256, 3))
base_model = VGG16(include_top=False, weights='imagenet', input_tensor=inputs)
x = base_model.output
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dense(4096, activation="relu")(x)
outputs = layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

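Here weights is set to 'imagenet', meaning the weights pretrained on ImageNet are loaded. It can also be set to None, which starts from random weights so you can train from scratch on your own dataset. Next, the convolutional and pooling part of this VGG16 model is connected to our own layers: one Flatten layer and three fully connected Dense layers. The last Dense layer has 5 neurons, one per class, and outputs a probability for each class.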

Running model.summary():

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_8 (InputLayer)         [(None, 256, 256, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 64, 64, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 64, 64, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 64, 64, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 32, 32, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 32, 32, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 32, 32, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 32, 32, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 16, 16, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 8, 8, 512)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 32768)             0         
_________________________________________________________________
dense (Dense)                (None, 4096)              134221824 
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              16781312  
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 20485     
=================================================================
Total params: 165,738,309
Trainable params: 165,738,309
Non-trainable params: 0
_________________________________________________________________

The last fully connected layer now matches the output our dataset needs.
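As a quick sanity check on the new top layers, feed a dummy batch through the model and confirm that it returns one probability per class for each image. This sketch rebuilds the same architecture with weights=None only to avoid the ImageNet download. The random input is placeholder data, not the bear dataset.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

# Same architecture as above; weights=None (random init) so nothing is downloaded.
# Use weights="imagenet" for real training.
inputs = tf.keras.Input(shape=(256, 256, 3))
base_model = VGG16(include_top=False, weights=None, input_tensor=inputs)
x = layers.Flatten()(base_model.output)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dense(4096, activation="relu")(x)
outputs = layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# A dummy batch of 2 random "images"
dummy = np.random.rand(2, 256, 256, 3).astype("float32")
probs = model.predict(dummy, verbose=0)

print(probs.shape)        # (2, 5): one row per image, one column per class
print(probs.sum(axis=1))  # softmax output, so each row sums to about 1.0
```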

Adding the data augmentation layer

Finally, add the data augmentation layers introduced on Day 8:

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers

# use data augmentation layer
data_augmentation = tf.keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

inputs = tf.keras.Input(shape=(256, 256, 3))
# Apply the augmentation block first, then feed its output into VGG16
x = data_augmentation(inputs)
base_model = VGG16(include_top=False, weights='imagenet', input_tensor=x)
x = base_model.output
x = layers.Flatten()(x)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dense(4096, activation="relu")(x)
outputs = layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

Running model.summary():

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 256, 256, 3)]     0         
_________________________________________________________________
sequential (Sequential)      (None, 256, 256, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 64, 64, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 64, 64, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 64, 64, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 32, 32, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 32, 32, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 32, 32, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 32, 32, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 16, 16, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 16, 16, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 8, 8, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 32768)             0         
_________________________________________________________________
dense (Dense)                (None, 4096)              134221824 
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              16781312  
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 20485     
=================================================================
Total params: 165,738,309
Trainable params: 165,738,309
Non-trainable params: 0
_________________________________________________________________

The extra sequential layer right after the input layer is the data augmentation block (you can rename it). It is now part of the model.
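Now that the augmentation block is part of the model, it helps to know when it actually runs. Keras preprocessing layers such as RandomFlip, RandomRotation and RandomZoom only transform images when called with training=True, which is what happens during fit(). When called with training=False, as in predict() or evaluate(), they pass the input through unchanged. The sketch below checks this on a batch of random placeholder images.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Same augmentation block as above
data_augmentation = tf.keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

# A batch of 2 random "images" standing in for real data
images = np.random.rand(2, 256, 256, 3).astype("float32")

# Inference mode: the augmentation layers leave the images unchanged
inference_out = data_augmentation(images, training=False)
# Training mode: random flips, rotations and zooms are applied
training_out = data_augmentation(images, training=True)

print(np.allclose(np.asarray(inference_out), images))  # True
print(tuple(training_out.shape))                        # (2, 256, 256, 3)
```

So you do not need to remove the augmentation block before predicting or deploying; it only takes effect during training.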

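Today we looked at how to write models with the Functional API, and building models feels much more flexible now! Tomorrow we'll start training the model, so keep a firm grip on the steering wheel~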


